REPORT DOCUMENTATION PAGE

Security classification: UNCLASSIFIED
Accession number: AD-A261 439
Distribution/availability of report: Approved for public release; distribution unlimited.
Performing organization: Harvard University, Department of Psychology, 33 Kirkland Street,
Cambridge, MA 02138
Monitoring organization: Air Force Office of Scientific Research/NL, Building 410,
Bolling AFB, DC 20332-6448
Funding/sponsoring organization: AFOSR, Building 410, Bolling AFB, DC 20332-6448
Procurement instrument identification number: AFOSR 89-0461
Title: Perception and the Temporal Properties ...
Type of report: Final Report
Date of report: January 11, 1993
Subject terms: speech perception, prosody, context effects, phonetic segments, fricatives,
stress

Abstract:

This research examines the interaction of acoustic and lexical information in the
identification of words in lexically ambiguous phoneme sequences. In Experiment 1,
subjects show priming for the meaning of a large word like "tulips" when presented with
a sequence of combinable short words like "two lips." In Experiment 2 priming is
found for the meaning of the second short word in similar sequences (e.g. "lips" in
"two lips"). Finally, Experiment 3 demonstrates that listeners do not show priming
for a short word like "lips" when it is pronounced as part of a larger word like "tulips."
The results of these experiments show that listeners sometimes access words other than those
intended by speakers, and that they may simultaneously access words associated with
several alternative parses of ambiguous sequences. Furthermore, they suggest that
acoustic marking of word onsets places constraints on the success of lexical access.

To account for these results, we present a new model of lexical access and segmentation,
the Good Start model, which gives a principled account of these properties.

Abstract security classification: UNCLASSIFIED
Name of responsible individual: Dr. John Tagney

DD FORM 1473, 83 APR


Statement of Work

Work on this project will extend previous work on the context-dependent nature of
temporal cues to the identity of phonetic segments, and on the role of coarse-grained
aspects of the speech signal in facilitating segment recognition. These extensions will
address the following questions: Do adjacent segments exhibit mutual dependencies
resulting in perceptual ambiguity that can be overcome by contextual information present
in coarse-signal characteristics? Can coarse-grained aspects of the speech signal, lacking
sufficient information for segment identification, convey speaking rate independently of
variation in the inherent durations of the underlying segments? Do coarse-grained aspects
of precursive speech contribute contextual information that is used early in the time
course of segment recognition? Can coarse-grained aspects of the speech signal direct
attention to the location of upcoming stressed syllables?

Work on the project will directly study the nature of coarse-grained aspects of the
signal and their relation to processing the suprasegmental temporal aspects of speech.
New techniques will be developed for creating coarse-grained representations of speech
that eliminate information about segment identity but preserve prosodically relevant
aspects of the speech signal. These techniques will permit control over degree of
resolution in the short-time spectrum of speech. Perceptual studies, involving direct
judgments on stimuli with varying amounts of spectral resolution, will be performed to
determine the amount of spectral detail that is necessary for perceiving important
temporal components of prosody.
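
The Statement of Work does not say how the coarse-grained representations are to be built. As one hedged illustration of what "control over degree of resolution in the short-time spectrum" could mean, the Python sketch below averages the bins of a single magnitude spectrum into a chosen number of equal-width bands. The function name `coarsen_spectrum`, the equal-width banding, and the sample values are assumptions made for illustration only, not the project's technique.

```python
def coarsen_spectrum(magnitudes, n_bands):
    """Average one magnitude spectrum into n_bands equal-width bands.

    Hypothetical helper: returns a list as long as the input in which
    every bin holds the mean of its band, so fewer bands means less
    spectral detail while the overall level is kept.
    """
    n = len(magnitudes)
    if not 1 <= n_bands <= n:
        raise ValueError("n_bands must be between 1 and len(magnitudes)")
    out = []
    for b in range(n_bands):
        start = b * n // n_bands
        stop = (b + 1) * n // n_bands
        band = magnitudes[start:stop]
        band_mean = sum(band) / len(band)
        out.extend([band_mean] * len(band))
    return out


# Illustrative spectrum for one analysis frame (made-up values).
frame = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
print(coarsen_spectrum(frame, 8))  # full resolution
print(coarsen_spectrum(frame, 2))  # two broad bands
```

Applying the same function to every frame of a short-time spectrum with a smaller `n_bands` would yield progressively coarser stimuli of the kind the perceptual studies call for.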

As part of the project, a computer simulation will be developed that will test the
computational adequacy of the processes that are hypothesized to underlie human
perception of the temporal properties of speech. This model will address three related
issues: the segmentation of speech into syllables, the use of temporal relations between
syllables to generate expectancies about the temporal properties of upcoming syllables,
and the contextual modulation of feature analyzers for processing temporal cues to
segment identity.


Status  of  Research 


Publications 

Gordon, P.C., Eberhardt, J.L., & Rueckl, J.G. (1992). Attentional modulation of
the phonetic significance of acoustic cues. Cognitive Psychology.

Gordon, P.C., Schaeffer, C.P., & Kennison, S.M. (1991). Disambiguation of
segmental dependencies by extended phonetic context. Language and
Speech, 34(2), 157-176.

Yaniv, I., Meyer, D.E., Gordon, P.C., Huff, C.A., & Sevald, C.A. (1990). Vowel
similarity and syllable structure in motor programming of speech. Journal
of Memory and Language, 29, 1-26.

Eberhardt, J.L., & Gordon, P.C. (1990). Effects of attention on the perception of
phonologically natural forms. Journal of the Acoustical Society of
America, 88, Suppl. 1.

Gow, D.W., & Gordon, P.C. (1990). Perceptual and acoustic measures of stress
shift. Journal of the Acoustical Society of America, 88, Suppl. 1.

Eberhardt, J.L., & Gordon, P.C. (1989). The effects of attention on the phonetic
integration of acoustic information. Journal of the Acoustical Society of
America, 86, Suppl. 1.

Gow, D.W., & Gordon, P.C. (1989). Two paradigms for examining the role of
phonological stress in sentence processing. Journal of the Acoustical
Society of America, 86, Suppl. 1.

Manuscripts  under  review 

Gow, D.W., & Gordon, P.C. Lexical and Prelexical Influences on Word
Segmentation: Evidence from Priming.

Gow,  D.W.,  &  Gordon,  P.C.  Syllable  stress  in  the  processing  and  representation 
of  spoken  sentences. 


Lexical and Prelexical Influences on Word Segmentation: Evidence from Priming


David W. Gow, Jr. and Peter C. Gordon
Harvard  University 


Running  Head:  Lexical  Segmentation... 

Send page proofs to:

David  W.  Gow,  Jr. 
Department  of  Psychology 
Harvard  University 
33  Kirkland  St. 

Cambridge,  MA  02138 




Abstract 

This research examines the interaction of acoustic and lexical information in the
identification of words in lexically ambiguous phoneme sequences. The cross-modal lexical
priming technique is used to determine which word meanings listeners access at the offsets of
such ambiguous sequences when they are present in connected speech. In Experiment 1,
subjects show priming for the meaning of a large word like "tulips" when presented with a
sequence of combinable short words like "two lips". In Experiment 2 priming is found for the
meaning of the second short word in similar sequences (e.g. "lips" in "two lips"). Finally,
Experiment 3 demonstrates that listeners do not show priming for a short word like "lips" when
it is pronounced as part of a larger word like "tulips". The results of these experiments show
that listeners sometimes access words other than those intended by speakers, and that they may
simultaneously access words associated with several alternative parses of ambiguous sequences.
Furthermore, they suggest that acoustic marking of word onsets places constraints on the
success of lexical access. To account for these results, we present a new model of lexical access
and segmentation, the Good Start model, which gives a principled account of these properties.




Printed English provides readers with unambiguous markers which segment sentences
into their component words. Spoken language is less considerate; words usually abut one
another without intervening fences, and are often woven together through coarticulation to
form a seamless stream of acoustic-phonetic information. The process of lexical access
therefore includes an element of lexical segmentation: listeners must either directly or
indirectly recognize the boundaries of the words they identify. A number of schemes for
isolating words have been proposed. These schemes can be broken down into two groups
based on the kinds of information and processes that they emphasize. One group emphasizes
the role of prelexical perceptual processes, and the other focuses on the role of lexical or
contextual processes.

The prelexical approach stresses the importance of identifying acoustic cues that may
mark word boundaries. The basic strategy behind this approach is to locate word boundaries
without the aid of hypotheses about the identity of the actual words in a sequence; listeners
only initiate lexical access after they have identified a word onset based on its local perceptual
features. It follows that listeners access words sequentially when processing connected speech,
and that they locate the word boundaries intended by the speaker regardless of the words he
or she chooses. Work by Lehiste (1960) on English, and Gårding (1967) on Swedish, identified a
number of acoustic correlates of word boundary, including glottal stops and/or laryngealized
voicing at the onset of word-initial vowels, and increased aspiration on voiceless stops. Nakatani
and Dukes (1977) manipulated these cues in forced choice listening experiments using stimuli
formed by editing together portions of pairs of words like "no notion" and "known ocean". They
found that these acoustic cues strongly influenced the way listeners segmented their hybrid
word pairs. However, there are no such cues associated with many types of word juncture, and
the phonological rules for marking word onsets by these cues tend to be optional in most
dialects.




Other work has focused on continuously varying cues such as duration, amplitude and
pitch that tend to vary as a function of stress, and which also tend to mark word onsets.
Nakatani and Schaffer (1978) used reiterant speech (repeating a syllable, e.g., /ma/, with the
prosody of a word or phrase) to produce lexically ambiguous phoneme strings with minimal
spectral marking of word boundaries, but normal prosody. They found that these prosodic
manipulations influence listeners' lexical segmentation judgments. Subsequent experiments
using speech resynthesis to manipulate stress correlates independently showed this was
primarily due to duration and rhythm cues; variations in amplitude and pitch did not affect
listener segmentation performance. As in the case of spectral cues, there is evidence that
duration cues are not completely reliable. Barry (1981) found that although duration cues are
present in citation form tokens, they are far less pronounced in running text.

The work of Cutler and her associates (Cutler and Carter, 1987; Cutler and Norris, 1988;
Butterfield and Cutler, 1988; Cutler, 1990; Cutler and Butterfield, 1992) examines the
interaction of prelexical and lexical processes in segmentation. They have focused on the
distinction between strong syllables (which contain full vowels) and weak syllables (which
contain reduced vowels) as a possible prelexical source of information about the location of
word boundaries. Cutler and Norris (1988) showed that the patterning of strong and weak
syllables influences subjects' ability to detect words embedded in nonwords. Cutler and
Butterfield (1992) found that both natural and laboratory-induced lexical segmentation errors
are also affected by this patterning. Cutler and Carter (1987) examined a large database of
transcriptions of natural speech to determine how reliably vowel quality marks word onsets.
They found that 74% of all full vowels are found in the initial syllables of grammatical words.
Their segmentation algorithm, the Metrical Segmentation Strategy (MSS), capitalizes on this
distribution by positing separate lexicons for lexical and grammatical words, and initially treating
all syllables with full vowels as potential lexical word onsets. The occurrence of a full vowel
triggers segmentation according to the MSS, causing listeners to terminate ongoing look-ups
and begin new ones starting with the syllable containing the full vowel.
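
The prelexical trigger described above can be sketched in a few lines of Python. This is a minimal illustration, not Cutler and colleagues' implementation: it assumes the input is already syllabified and labelled strong or weak, it covers only the "start a new look-up at every strong syllable" step, and it omits the separate lexicons and the lexical verification component. The function name `metrical_chunks` and the strong/weak labels in the example are illustrative assumptions.

```python
def metrical_chunks(syllables):
    """Group (syllable, is_strong) pairs into look-up chunks.

    A new chunk is opened at every strong syllable; weak syllables are
    attached to the chunk in progress. Weak syllables that come before
    any strong syllable form a chunk of their own.
    """
    chunks = []
    for text, is_strong in syllables:
        if is_strong or not chunks:
            chunks.append([text])
        else:
            chunks[-1].append(text)
    return chunks


# "a letter arrived", with hypothetical strong (True) / weak (False) labels.
utterance = [("a", False), ("le", True), ("tter", False),
             ("a", False), ("rrived", True)]
print(metrical_chunks(utterance))
```

In this example the weak first syllable of "arrived" is grouped with the preceding chunk, which shows why a purely prelexical trigger needs some lexical follow-up to repair words that begin with weak syllables.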

The assertion that full vowels can be detected to trigger segmentation implies that vowel
reduction is a discrete quality. However, Stetson (1951) notes that there is a continuous range
in vowel quality between schwa and full vowels. Moreover, Fry, Abramson, Eimas and Liberman
(1962) have shown that vowels are perceived along a continuum.^ This suggests that listeners
may not normally make clear distinctions between full and reduced vowels. This is reflected in
English orthography, which makes no distinction between full and reduced vowels.

The lexical component of the MSS is involved in verification and error correction. It
states that when listeners encounter a strong syllable they access the longest word consistent
with the input beginning with the previous strong syllable. This provides for strictly sequential
lexical access as long as each search produces a valid word. When no candidate words are
consistent with the input, the MSS calls for backtracking to access shorter words, the
continuation of an ongoing search, or the reassignment of the previous syllable to the current
input and the initiation of a new search. By combining these lexical and prelexical processes,
the MSS provides a means for listeners to sequentially categorize speech sounds into words.
Cutler and Carter (1987) tested the MSS on a paragraph-long passage, and found that it led to
the successful recognition of 82% of all words. Briscoe (1989) compared the performance of
the MSS to that of segmentation strategies in which lexical access is initiated with every new
phoneme or syllable a listener encounters, or at the offset of every word that is recognized, and
found that it generated fewer lexical hypotheses than these strategies did across several types of
phonetic transcription.

In summary, there is evidence that listeners' segmentation performance is sensitive to a
variety of prelexical cues in the speech signal. However, it is unclear whether these cues are
reliable enough to explain our apparent facility for accurate lexical segmentation. Spectral cues
such as glottal stops, aspiration and laryngealization are generally optional, and only mark
certain types of word boundaries, and rhythmic cues are diminished in connected speech. Even
relatively consistent forms of marking like the occurrence of strong syllables require an
elaborate processing algorithm like the Metrical Segmentation Strategy to account for good, but
imperfect, lexical segmentation performance. It appears, then, that a more complete
understanding of lexical segmentation rests on either the discovery of more reliable acoustic
indicators of word boundary location, or the identification of other sources of boundary
information.

Lexical information may provide alternative routes to the discovery of word boundaries
which avoid any reliance on special acoustic marking of word boundaries. Two segmentation
strategies have been proposed which posit no role for acoustic marking of juncture; they are
premised on the notion that lexical access leads to lexical segmentation. One approach argues
that words are recognized sequentially, and that listeners locate word onsets by recognizing the
words (and thus the word offsets) that immediately precede them. Marslen-Wilson and Welsh
(1978) and Cole and Jakimik (1980) have proposed lexical access models that allow listeners to
identify words as soon as enough information is available to distinguish them from all other
words with the same onset. These models predict that subjects may anticipate a word's offset,
and thus the location of the onset of the next word, when this recognition point occurs before
its offset. However, Luce (1986) has found that words are only uniquely defined prior to their
offsets roughly 40% of the time, and so early recognition cannot be considered a reliable strategy.
Grosjean (1985) argues that in these cases words are only identified after their offsets. He
presented subjects with gated portions of sentences like "I saw the boar in the woods", and
found that they were only certain that they had heard the word "boar", and not something like
"board" or "born", when they had heard a portion of the next word. Grosjean also used this
technique to examine the assumption that lexical access is sequential. He presented subjects
with short phrases like "bun in the oven" which included nouns like "bun" which could only be
recognized after their offsets, followed by prepositions, and found that subjects were more
likely to recognize the noun and the preposition at the same time, or recognize the preposition
before the noun, than they were to recognize the words in sequential order. This provides
some evidence that the initiation of lexical search is not delayed until the previous lexical search
is completed, calling into question the idea that lexical access is strictly sequential. However,
Grosjean's (1985) gating procedure is an off-line task that may be subject to strategic effects.
Therefore, it cannot tell us definitively if word recognition is strictly sequential in the real-time
interpretation of connected speech.
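
The notion of a word being "uniquely defined prior to its offset" can be made concrete with a small Python sketch. It is a simplified illustration rather than any of the cited models: the function name `uniqueness_point` is hypothetical, the toy lexicon is invented, and letters stand in for phonemes.

```python
def uniqueness_point(word, lexicon):
    """Return the length of the shortest prefix of `word` that no other
    lexicon entry shares, or None if every prefix (including the whole
    word) also begins some other entry.
    """
    others = [w for w in lexicon if w != word]
    for n in range(1, len(word) + 1):
        prefix = word[:n]
        if not any(o.startswith(prefix) for o in others):
            return n
    return None


# Toy lexicon; spellings are used as stand-ins for phoneme strings.
lexicon = ["boar", "board", "born", "bun", "bunny"]
for w in ("boar", "born", "bun"):
    print(w, uniqueness_point(w, lexicon))
```

A result smaller than the word's length means the word could in principle be identified before it ends. A result of None corresponds to cases like "boar" in "boar"/"board", where the listener has to hear part of the following word.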

Another lexical segmentation strategy that depends on lexical information is to attempt
to access every word which is consistent with a pattern in the input. This involves treating each
new unit of speech^ as a possible word onset. Unlike the strategies described above, this
approach does not call for sequential, word by word lexical access. It hinges on listeners' ability
to carry out simultaneous lexical searches associated with several different readings of
segmentation ambiguities. This multiple access strategy can be found in several computer
models including Klatt's (1980) SCRIBER, and McClelland and Elman's (1986) TRACE model.
Klatt's proposed SCRIBER system identified the words in connected speech by matching
acoustic input against sequences of templates representing portions of words and word
transitions. SCRIBER simultaneously evaluated all likely sequences, and ultimately selected the
best one based on an overall measure of matching between the input and the templates in a
sequence.
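
The idea of treating every unit as a possible word onset can be illustrated by enumerating all the ways a phoneme string divides into known words. The Python sketch below is a simple exhaustive search, not SCRIBER's template matching or TRACE's activation dynamics. The function name `all_parses`, the ad hoc phoneme spellings, and the three-word lexicon are assumptions for illustration.

```python
def all_parses(phonemes, lexicon):
    """Return every division of `phonemes` into consecutive lexicon entries.

    `lexicon` maps a phoneme string to the word it spells. Each parse is
    a list of words; an empty input has exactly one (empty) parse.
    """
    if not phonemes:
        return [[]]
    parses = []
    for end in range(1, len(phonemes) + 1):
        word = lexicon.get(phonemes[:end])
        if word is not None:
            for rest in all_parses(phonemes[end:], lexicon):
                parses.append([word] + rest)
    return parses


# Hypothetical transcriptions: one character per phoneme.
lexicon = {"tu": "two", "lIps": "lips", "tulIps": "tulips"}
print(all_parses("tulIps", lexicon))  # [['two', 'lips'], ['tulips']]
```

A multiple access model keeps both parses available at once and needs some further mechanism, such as an overall match score or competition between candidates, to choose among them.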

The TRACE model (McClelland and Elman, 1986) also uses a multiple segmentation
approach, and only produces lexical segmentation as a byproduct of lexical access. TRACE is an
interactive activation model which can identify words in "mock speech" consisting of idealized
representations of acoustic feature values at different points in time. McClelland and Elman
(1986) found that TRACE was able to correctly identify, and thus appropriately segment, almost
all of the words in its 211-word lexicon when they were presented in word pairs that lacked any
overt boundary marking. However, TRACE does not segment accurately when word pairs have
the same sequence of phonemes as longer words, showing a strong bias for recognizing long
words, but not embeddable short words. This is due to TRACE's mechanism for limiting lexical
access. TRACE uses competition to identify a unique segmentation solution when word
boundaries are ambiguous. For instance, the input representation /parti/ passes activation to
the nodes which represent the words "par", "tea" and "party". In turn these lexical nodes pass
inhibition to all other lexical nodes. The more activation a lexical node receives, the more
inhibition it can pass to its competitors. Activation is a function of the amount of excitatory
input a node gets, giving longer words an advantage over shorter words because they receive
excitatory input from a larger number of input nodes.
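
The long-word advantage that follows from this kind of competition can be seen in a toy simulation. The Python sketch below is a deliberately reduced illustration and does not reproduce TRACE's actual equations, time slices, or parameters: each word node simply receives bottom-up input proportional to the number of phonemes in /parti/ that support it, and is inhibited in proportion to the summed activation of the other nodes. The function name `compete` and the `gain`, `inhibition`, and `steps` values are assumptions.

```python
def compete(excitation, gain=0.1, inhibition=0.4, steps=20):
    """Run a toy lateral-inhibition update over word nodes.

    `excitation` maps each word to its bottom-up input (here, the number
    of supporting input phonemes). Activations are clipped to [0, 1].
    """
    act = {word: 0.0 for word in excitation}
    for _ in range(steps):
        total = sum(act.values())
        updated = {}
        for word, bottom_up in excitation.items():
            from_others = total - act[word]
            value = act[word] + gain * bottom_up - inhibition * from_others
            updated[word] = min(1.0, max(0.0, value))
        act = updated
    return act


# Input /parti/: "par" matches 3 phonemes, "tea" 2, "party" all 5.
result = compete({"par": 3, "tea": 2, "party": 5})
print(max(result, key=result.get))
print(result)
```

With these illustrative settings the node with the most bottom-up support ends up dominating and suppresses the shorter embedded candidates, which is the bias the experiments below set out to test against human data.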

TRACE demonstrates that, in principle, multiple access approaches to lexical
segmentation can yield relatively good performance, with some important exceptions, without
any reliance on acoustic boundary cues.^ It is important to remember, though, that the
segmentation strategies employed by TRACE or SCRIBER are motivated by design constraints
rather than psychological data. As such, they must be considered largely untested as
psychological models. Fortunately, the TRACE model makes strong predictions about
segmentation preferences which can be evaluated using human performance data.

Goals  and  Research  Strategy 

The goal of the present work is to explore the relationship between prelexical and
lexical factors in lexical segmentation. Prelexical models predict that lexical segmentation
should lead to the identification of only those words associated with an accurate parse of a
speaker's intended utterance. Lexical approaches like TRACE predict that segmentation will
tend to reflect the lexical properties of potential words to be accessed rather than the intentions
of speakers. Of course, neither approach is likely to be completely correct. It is likely that
lexical segmentation processes depend on a combination of prelexical and lexical information.




In order to understand how listeners use these types of information to identify word
boundaries, we must examine two fundamental issues about processing. First, we must
determine whether listeners initially perceive word boundaries as speakers intend them to.
Prelexical segmentation models predict that listeners should generally segment input in
agreement with speakers' intentions, while lexical models predict that listeners should show a
bias towards perceiving the longest potential word consistent with a phoneme string regardless
of acoustic features. Second, we must determine whether lexical access is sequential or parallel.
Sequential access is predicted by all of the current prelexical models of segmentation and by
some lexical models, while non-sequential multiple access is predicted by models like TRACE
and SCRIBER.

This work will employ the cross-modal lexical priming paradigm (CMLP) to examine the
interaction of lexical and prelexical influences on lexical segmentation and thus lexical access.
This paradigm has been used extensively to study the pattern of lexical access elicited by the
presentation of semantically or syntactically ambiguous speech (Swinney, 1979; Onifer and
Swinney, 1981; Tanenhaus, Leiman, and Seidenberg, 1979). In the CMLP paradigm, subjects
generally hear a word, referred to as the priming stimulus, and then are shown a sequence of
letters, the lexical decision probe. Their task is to indicate by a button press whether the
probe is a word or a nonword. In general, subjects are able to respond to the probe more
quickly when it is semantically related to the prime. By presenting subjects with primes that
are somehow ambiguous and then showing them probe stimuli that are related to different
interpretations of the ambiguity, researchers can determine which meanings listeners access at
a given point in time. Three experiments will be presented which use a CMLP task to
determine what meanings are accessed when subjects are presented with one or two words
consisting of lexically ambiguous phoneme strings. Experiment 1 will directly test the
prediction that listeners show a bias towards recognizing long words when lexical boundaries
are ambiguous at the level of the phoneme string. Experiment 2 will pit any acoustic marking
of word boundaries against the hypothesized long-word advantage to see if listeners access
small words individually when they can be combined to form long words. Finally, Experiment 3
will explore listeners' tendencies to recognize embedded words.
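
The logic of the priming measure can be written out as a short calculation: priming is the amount by which lexical decisions are faster for probes related to the prime than for unrelated probes. The Python sketch below assumes that simple difference-of-means definition. The function name `priming_effect` is hypothetical, and the reaction times are invented placeholders, not data from these experiments.

```python
from statistics import mean


def priming_effect(related_rts, unrelated_rts):
    """Return mean unrelated-probe RT minus mean related-probe RT (ms).

    A positive value means responses were faster after a related prime,
    which is taken as evidence that the related meaning was accessed.
    """
    return mean(unrelated_rts) - mean(related_rts)


# Placeholder reaction times in milliseconds (illustrative only).
related = [600, 620, 640]     # e.g. probe FLOWER after "tulips"
unrelated = [650, 670, 690]   # e.g. probe GRAMMAR after "tulips"
print(priming_effect(related, unrelated))  # 50
```

Computing this difference separately for each type of prime is what allows the experiments to ask which meanings were active at the offset of an ambiguous sequence.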

Experiment  1 

In this experiment, subjects heard sentences like "She placed her two lips on his cheek"
which contained two-word sequences such as "two lips" which could be combined to form a
longer, semantically unrelated word, such as "tulips". The second word consisted of a single
strong syllable in all cases. Several lexical models of segmentation (McClelland and Elman,
1986; Swinney, 1981) predict that these sequences should produce strong priming for the
longer word formed through concatenative combination. Conversely, prelexical models predict
that normal speech contains acoustic boundary cues that should lead listeners to correctly
segment these sequences into two parts, and avoid accessing the concatenated long word. The
Metrical Segmentation Strategy (Cutler and Carter, 1987; Cutler, 1990) predicts that
concatenation should be blocked when the second word in a potentially combinable phrase has
a full vowel in its first syllable.

Method 

Subjects. Twenty undergraduate students participated in a single 45 minute session, and
were paid between $5.00 and $5.75, based on their performance on the experimental task. All
of the subjects were native speakers of American English with no reported auditory or
(uncorrected) visual deficits.

Stimuli. The priming sequences were 48 combinable two-word sequences and 48 single
words with identical phonemic transcriptions (e.g., "two lips" and "tulips"). These priming
sequences were embedded in the sixth and seventh syllable positions of sentences which were
identical and contextually neutral prior to the primes. The single word primes were all
monomorphemic words with first syllable stress. The two-syllable primes consisted of pairs of
one-syllable words, both words containing full vowels.

The sentences were read aloud by one of the authors (DWG), who read them relatively
quickly, in a manner that sounded fluent and spontaneous. The acoustic marking of
one- and two-word priming sequences was manipulated indirectly via the speaker's intention to
pronounce either one or two words. These two types of priming stimuli provided a way of
manipulating acoustic boundary marking without varying phonemic sequences. This
manipulation allowed for the presence of a full range of both known and potential unknown
acoustic segmentation cues. The sentences were read in a sound-attenuating chamber and were
recorded on a digital audio recorder at a sampling rate of 32 kHz with a Shure SM59
microphone. The recordings were low pass filtered at 10 kHz, redigitized at 20 kHz and equated
for amplitude. The offsets of priming sequences were located using both auditory and visual
inspection of digitized waveforms and spectrograms.

The lexical decision probes were single words and pronounceable nonwords. Each
priming sequence was paired with a legal word that was related to the meaning of the
one-syllable word. Related probe words were selected through a survey in which fifteen college
students were asked to write down the first three words that they thought of after reading a
single word prime. The most common responses were selected for the related probe condition.
For instance, both the sequences "tulips" and "two lips" were paired with the lexical decision
probe "FLOWER". Priming sequences were also paired with a lexical decision probe word that
was unrelated to any of the words in the priming sequence (e.g., "two lips" and "tulips" were both
paired with the probe "GRAMMAR"). These unrelated probes served as related probes for other
primes. Pronounceable nonword probes were constructed to match the CV pattern and syllable
structure of probe words used in the task.




Lexical decision probes were presented in uppercase letters on a CRT placed roughly
30"  away  from  subjects  at  eye  level. 

Procedure. Subjects were seated in a sound attenuating chamber facing the CRT. They
wore headphones and placed two fingers from their dominant hand over two buttons on a
computer mouse. Subjects initiated each trial by pressing either mouse button after being
presented with a prompt. When they pressed the button two fixation lines briefly appeared on
the screen to draw their attention to the position of the upcoming lexical decision probe. When
the fixation lines disappeared, the subjects heard a sentence through the headphones. At some
point during the sentence the lexical decision probe appeared on the screen and subjects had
to determine if the probe was a word or a nonword. In experimental trials, lexical decision
probes were presented immediately at the offset of priming sequences. In filler trials, probe
position was varied within the sentence to prevent subjects from changing their processing
strategies in anticipation of the probe. Subjects were instructed to press the left button if the
probe was a word, and the right button if the probe was not a word. The sentence stopped as
soon as a button was pressed. At that point, subjects were to repeat everything that they heard
into the microphone. The repetition task was employed to ensure that subjects attended to the
sentences. After each trial, subjects were given feedback showing how quickly they responded,
whether they were correct, and how many points they earned towards a performance
bonus. The number of points earned for correct responses was based on the speed of the
response. Points were lost for incorrect responses.

Design. There were four experimental conditions defined by the combination of two
prime conditions (one-word versus matching two-word primes) and two probe conditions
(related versus unrelated to the one-word prime). An individual subject saw each of the 48
experimental prime sequences in only one of the four conditions, and across subjects each
sequence appeared equally often in all four conditions. In addition to the experimental trials, each
subject completed 72 filler trials in which lexical decision probes were unrelated to primes.
Lexical decision probes were presented at different locations in priming sentences. The trials
were broken down into five blocks of 24 sentences. The first block contained only filler trials to
allow performance on the task to stabilize. Each of the last four blocks included three trials in
each of the four experimental conditions. All blocks contained equal numbers of word and
nonword trials in the lexical decision task.
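The counterbalancing described above can be sketched as a simple rotation of items through conditions. This is only an illustration under stated assumptions: the text does not say how sequences were assigned to presentation lists, so the rotation scheme, the use of exactly four lists, and the names `CONDITIONS` and `assign_conditions` are hypothetical.

```python
from collections import Counter

PRIMES = ("one-word", "two-word")
PROBES = ("related", "unrelated")
# The four experimental conditions: prime type crossed with probe type.
CONDITIONS = [(prime, probe) for prime in PRIMES for probe in PROBES]


def assign_conditions(n_items=48, n_lists=4):
    """Return n_lists presentation lists (hypothetical rotation scheme).

    List k maps each item index to one condition; rotating by k means
    every item meets every condition exactly once across the lists.
    """
    lists = []
    for k in range(n_lists):
        lists.append({item: CONDITIONS[(item + k) % len(CONDITIONS)]
                      for item in range(n_items)})
    return lists


lists = assign_conditions()
# Number of items per condition within each list (12 each for 48 items).
per_list = [Counter(lst.values()) for lst in lists]
# Set of conditions each item receives across the four lists.
per_item = [{lst[item] for lst in lists} for item in range(48)]
```

With 48 sequences and four conditions, each list holds 12 items per condition, which agrees with the three trials per condition in each of the last four blocks.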


Table 1 shows the mean response times and error rates in each of the four
experimental conditions. Subjects had an overall accuracy rate on experimental trials of 93%
with no significant differences in error rates between experimental conditions. There was a
significant effect for relatedness, F1(1,19) = 12.7, p < .005, F2(1,47) = 7.4, p < .01, with
responses to related lexical decision probes faster than unrelated ones. However, there was no
significant difference between trials with single word and concatenated word primes, F1(1,19)
< 1, F2(1,47) = 1.3, p > .05. There were no significant interactions, F1(1,19) < 1, F2(1,47) < 1.

/insert  table  1  about  here/ 


Discussion 

The results of Experiment 1 suggest that listeners access word meanings derived from
unintended segmentations of strings of lexically ambiguous phonemes. This result is consistent
with lexical accounts of segmentation which predict that listeners access the longest word that is
consistent with a phoneme string. This lexical bias has been explained in two ways. Prather and
Swinney (1977, cited in Swinney, 1981) offer the sequentialist account that lexical access is
guided by a principle of minimal accretion which states that listeners activate the longest word
that is consistent with input, and do not attempt to access a new word until input is inconsistent
with the ongoing lexical search. For example, they have shown that on hearing the word
"boycott" listeners temporarily access "boy" but do not access "cot". The TRACE model
(McClelland and Elman, 1986) offers the non-sequentialist explanation that long words are
favored over short words because they receive more bottom-up activation. As they receive
more activation they are in turn able to direct more inhibition to competing short words. Both
accounts of the long word bias attribute it to lexical factors, and minimize the role of prelexical
influences on lexical segmentation.

This result contradicts the Metrical Segmentation Strategy’s prediction (Cutler and
Carter, 1987; Cutler, 1990) that the full vowel in the second word of two-word sequences
should trigger segmentation and prevent listeners from accessing the word formed through
concatenation. According to the MSS, listeners should access only the meanings of the two
short words. The priming by concatenated word meanings suggests that lexical lookup was not
terminated by the full vowel in the second word of the priming sequences. However, the
occurrence of syllables with full vowels may still play a role in lexical segmentation. This role
may simply be masked by an overwhelming lexical bias towards perceiving long words.
Alternatively, it is possible that lexical access is non-sequential, and that lexical access is initiated
at all strong syllables.


Experiment 2

Experiment 1 suggested that listeners show a bias towards perceiving long words when
phoneme strings are lexically ambiguous. This result almost certainly rests on a lexical
mechanism. However, it is unclear whether this mechanism depends on sequential access as
suggested by several models (Marslen-Wilson and Welsh, 1978; Cole and Jakimik, 1980;
Swinney, 1981; Cutler and Carter, 1987), or nonsequential parallel access as suggested by
models like McClelland and Elman’s (1986) TRACE model. To determine whether lexical
access is sequential, Experiment 2 tested whether the long-word bias
prevents the simultaneous access of combinable short words. Subjects were presented with
one word priming stimuli in contexts allowing concatenation (e.g., "lips" in "two lips") and in
contexts that do not allow concatenation (e.g., "lips" in "warm lips"). If listeners consider several
segmentations in parallel, then they should show both priming for long words formed by
concatenation, and for the short words that make them up.

Different non-sequentialist models make different predictions about the pattern of
priming these stimuli should produce. Such multiple access models attempt to access words
associated with both appropriate and inappropriate parses of a speaker’s intended utterance,
which leads to the consideration of more lexical candidates than are ultimately necessary.
Therefore, they depend on mechanisms that limit the number of words that are ultimately
accessed. The TRACE model does this before lexical candidates are fully activated.
Competition between lexical candidates leads to the early inhibition of words that are not fully
activated. In their TRACE simulations, Frauenfelder and Peeters (1990) have demonstrated that
the activation of words embedded in the second syllable of longer words actually sinks below its
resting level by their offset as a result of the long word bias. Therefore, despite its
non-sequentialist access strategy, TRACE predicts that only long words should be accessed when
concatenation is possible, which suggests that we should see no priming for combinable short
words. One can also envision a multiple access approach in which inappropriate word meanings
are eliminated after they are accessed based on contextual constraints. Reddy (1976) and Cole
and Jakimik (1980) have offered evidence that semantic and syntactic context can constrain
lexical segmentation. Frauenfelder, Segui, and Dijkstra (1990) examined lexical effects in
monitoring tasks, and found evidence for lexical facilitation, but not inhibition (competition).
Without lexical inhibition, non-sequentialist lexical access would lead to the access of all words
with acoustic patterns corresponding to the input.


Subjects. Thirty-two subjects from the same population as the previous experiment were
paid between $5.00 and $5.90 based on their performance to serve in a 50 minute experiment.




Stimuli. The priming sequences were 24 two-word sequences in which the combination
of the two words formed a longer word (e.g. "two lips"), and 24 two-word sequences in which
the two words could not be combined to form a longer word (e.g. "warm lips"). The second
word in each uncombinable pair was the same as the second word in a combinable one. For
example, the counterpart of the combinable words "two lips" would be the uncombinable
sequence "warm lips". These priming sequences were incorporated in sentences using the same
construction and recording techniques as Experiment 1. Some tokens from the first experiment
were eliminated, and others were substituted in order to guarantee that the second words in
these sequences had strong associates. One third of the combinable stimuli were also used in
Experiment 1.

The lexical decision probes for the experimental trials were chosen on the basis of their
association with the meaning of the second word in the priming sequence. For example, the
sequences "warm lips" and "two lips" were paired with the related probe "KISS" as well as the
unrelated word "BOAT".

Procedure and Design. Subjects were tested using the same procedure that was
employed in Experiment 1. The design was also similar to that of Experiment 1. There were
four experimental conditions formed by the combination of two priming stimulus types
(combinable and uncombinable) and two probe types (related and unrelated). Each subject
heard each priming word in only one of the four experimental conditions. There were a total of
96 trials including 24 experimental trials and 72 filler trials. These trials were broken down into
four blocks of 24 trials with two trials from each of the four experimental conditions in each of
the last three blocks. Trial order was different for each subject within each block.

Results


Subjects performed the task with 95 percent accuracy on experimental trials. Responses
with reaction times greater than 3000 msec (less than 1% of the data) were eliminated from
analysis. Table 2 shows the mean response times for each of the four experimental conditions.
There was a main effect for relatedness with decisions for related probes faster than decisions
for unrelated probes, F1(1,31) = 12.5, p < .001, F2(1,23) = 6.3, p < .02. No main effect was
observed for the combinability of the priming stimulus, F1(1,31) < 1, F2(1,23) < 1.
Furthermore, there was no significant interaction between relatedness and combinability,
F1(1,31) < 1, F2(1,23) < 1.
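The data preparation behind such analyses can be sketched as follows: responses above the 3000 msec cutoff are dropped, and condition means are then computed once per subject and once per item. The sketch assumes the usual convention that F1 treats subjects and F2 treats items as the random factor. The trial records, subject and item labels, and the helper `condition_means` are hypothetical and are not data from the experiment.

```python
from collections import defaultdict
from statistics import mean

CUTOFF_MS = 3000  # responses slower than this were excluded in the text

# Hypothetical trials: (subject, item, relatedness, reaction time in msec).
trials = [
    ("s1", "item1", "related", 610),
    ("s1", "item2", "unrelated", 655),
    ("s2", "item1", "unrelated", 640),
    ("s2", "item2", "related", 590),
    ("s1", "item3", "related", 3400),  # over the cutoff, dropped below
]


def condition_means(trials, unit_index):
    """Mean RT per (unit, relatedness) cell after trimming slow responses.

    unit_index 0 aggregates by subject, unit_index 1 aggregates by item.
    """
    cells = defaultdict(list)
    for trial in trials:
        rt = trial[3]
        if rt > CUTOFF_MS:
            continue  # outlier: excluded from analysis
        cells[(trial[unit_index], trial[2])].append(rt)
    return {cell: mean(rts) for cell, rts in cells.items()}


by_subject = condition_means(trials, unit_index=0)  # input to a by-subject test
by_item = condition_means(trials, unit_index=1)     # input to a by-item test
```

The two dictionaries would feed separate analyses of variance, which is why each effect in the text is reported with a pair of F values.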


/insert  table  2  about  here/ 


Discussion 

The results of Experiment 2 show that words produce priming when they are in
phonemic contexts that allow concatenation, and that the amount of priming they produce is
comparable to that found when they are placed in contexts which do not allow concatenation.
Considered in isolation, this result suggests that listeners segment lexically ambiguous phoneme
strings in a way that is consistent with speakers’ intentions despite the long-word bias.

Taken together, the results of Experiments 1 and 2 suggest that listeners simultaneously
access words consistent with two different segmentations. Listeners accessed the meanings of
the words "tulips" and "lips" (and presumably "two") at the same time when presented with the
sequence "two lips". This result carries the important implication that words are not necessarily
recognized in strict sequential order as suggested by several models (Marslen-Wilson and
Welsh, 1978; Cole and Jakimik, 1980; Swinney, 1981; Cutler and Carter, 1987; Cutler, 1990).

Experiment 3

The purpose of Experiment 3 is to determine the degree to which a speaker’s intention,
and thus pronunciation, of a phoneme string affects listeners’ patterns of lexical access. The
results of Experiment 1 are consistent with a long-word perceptual bias which may override
prelexical segmentation cues. However, Experiment 2 showed that any such bias, if it exists,
does not prevent listeners from also accessing short, potentially combinable words that a
speaker intends to pronounce. These results suggest two possibilities. The first is that listeners
access every word which is consistent with a phoneme or syllable sequence in an input without
regard to the speaker’s intended pronunciation. The other possibility is that listeners are only
able to access words which begin with sequences that speakers intend to be word onsets. While
the results of Experiment 1 show that listeners may access words that speakers do not intend
them to, these words do begin with sequences that the speaker intended to be word onsets.
Experiment 3 examines the importance of speakers’ intention to pronounce word onsets in
lexical access. Specifically, it will determine if the apparent access of embeddable words in
Experiment 2 was a function of speaker intention.

In the present study, listeners heard sentences containing two syllable words (e.g.,
"tulips") with phoneme sequences which are also consistent with two shorter words (e.g., "two
lips"), and we looked for evidence of priming by the second embedded word in each pair.
These sequences consist of the same phoneme strings that served as priming stimuli in the
experimental trials in Experiment 2, but this time they were pronounced with the intention of
saying one word rather than two.

Two studies that are similar to the current one have yielded conflicting results. Prather
and Swinney (1977, cited in Swinney, 1981) found no evidence of priming by words embedded
in the second syllable of longer words. They interpret this result as evidence for a long-word
bias. However, Shillcock (1990) reports evidence that embedded words do produce priming,
but that its magnitude is modulated by the frequency of word onsets. This suggests that the
differences in results between the two experiments may be due in part to lexical factors relating
to the phoneme string. By using the same phoneme sequences in the current experiment that
were used in Experiment 2, we hope to provide a comparison between the two experiments
that will allow us to cancel out lexical effects, and focus on the impact of prelexical factors on
subjects’ patterns of lexical access.




Methods

Subjects. The subjects were forty undergraduate students drawn from the same
population as the subjects in Experiments 1 and 2. They were paid $5.00 to $5.50 based on
their performance on the task. Subjects were tested individually in a single 30 minute session.

Stimuli, Procedure and Design. The auditory primes were twenty-four one-syllable
words that appeared in each of two conditions. In the embedded condition, phoneme
sequences equivalent to these words formed the second syllable of a two-syllable word. In the
unembedded condition, priming words were preceded by a one-syllable word. Thus the
priming word "lips" is embedded in the carrier word "tulips" and also appears unembedded in
the sequence "warm lips". These priming sequences were based on the sequences used in
Experiment 2. Each combinable priming sequence in Experiment 2 (e.g., "two lips") consists of
the same phoneme sequence as a carrier word in this experiment (e.g., "tulips"). Furthermore,
these sequences are preceded by the same words in their sentential contexts in both
experiments.

The auditory priming stimuli were presented in conjunction with visually presented
lexical decision probes. The lexical decision probes for experimental trials were individual
words which were either related, or unrelated, to the embedded or unembedded prime word.
The probes used in this experiment were identical to the ones used in Experiment 2. Subjects
were tested using the same procedure and design that were employed in the previous
experiments.


The results of the experiment are summarized in Table 3. Responses with latencies
greater than 3000 msec (less than 1% of the data) were deemed outliers and excluded from
analysis as in Experiment 2. There was no main effect for embeddedness, F1(1,39) < 1,
F2(1,23) < 1. Similarly, there was no main effect for the relatedness of lexical probe stimuli to
priming stimuli, F1(1,39) < 1, F2(1,23) < 1. However, there was a significant interaction
between embeddedness and relatedness, F1(1,39) = 6.1, p < .025, F2(1,23) = 11.5, p = .002.
Unembedded priming stimuli produced a 35 msec priming effect, while embedded priming
stimuli produced a 22 msec reversal of the standard priming effect. Focused comparisons show
that the priming effect is significant, F1(1,39) = 7.0, p < .05, F2(1,23) = 8.5, p = .01, while the
inhibition effect is not, F1(1,39) < 1, F2(1,23) = 3.3, p > .05.
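The crossover pattern can be made concrete with a small calculation. The sketch below assumes the conventional definition of priming as the unrelated mean minus the related mean, so that a positive value is facilitation. It uses only the two effect sizes reported in the text; the cell means passed to `priming_effect` are hypothetical values chosen to illustrate the definition, and the function name is an assumption.

```python
def priming_effect(unrelated_rt, related_rt):
    """Priming in msec: unrelated minus related mean RT (positive = facilitation)."""
    return unrelated_rt - related_rt


# Hypothetical cell means, used only to illustrate the definition.
example_effect = priming_effect(unrelated_rt=635, related_rt=600)

# Effect sizes reported in the text (msec).
unembedded_effect = 35   # standard priming effect
embedded_effect = -22    # reversal of the standard priming effect

# The interaction is the difference between the two priming effects.
interaction_contrast = unembedded_effect - embedded_effect
```

The two effects have opposite signs, so the contrast is larger than either effect alone, which is consistent with a significant interaction in the absence of a main effect of relatedness.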

/insert  table  3  about  here/ 


Discussion

The results of Experiment 3 suggest that embedded words were not accessed, and that
listeners parsed lexically ambiguous phoneme sequences as speakers intend them to. The
results of Experiment 2 show that this is not attributable to any lexical bias since the same
phoneme sequences produced different patterns of priming when spoken with different
segmentations in mind. The contrast between these results implies that speakers mark these
sequences differently when they speak. Furthermore, they show that this marking affects the
likelihood of successful lexical access.

A comparison of the results of Experiments 2 and 3 suggests that acoustical factors play a
larger role in lexical segmentation and lexical access than is suggested by Shillcock (1990) or
Prather and Swinney (1977, cited in Swinney, 1981). Prather and Swinney’s minimal accretion
principle predicts that listeners access only the longest potential lexical candidate during the
perception of words. Experiment 3 appears to confirm this prediction. However, the results of
Experiment 2 suggest that some form of acoustic marking of word onsets can overcome a
potential advantage for long words. Similarly, Shillcock argues that the size of the cohort of
lexical candidates defined by a word onset determines the strength of priming for embedded
word meanings. Shillcock suggests that listeners attempt to access all embedded words, but
that access is blocked in some cases by inhibition from other members of the cohort.
Experiment 3 was not designed to examine the effects of cohort size or lexical frequency and so
we cannot address his results directly. Our results do show a suggestive trend towards the
inhibition of embedded word meanings, but it is not statistically significant. Shillcock also found
a trend towards inhibition in some cases. That trend was significant by subject but not by word.
Given the equivocal nature of this evidence, we can neither fully reject nor accept the notion
that lexical access is limited by interword competition effects. However, the priming of
embeddable word meanings found in Experiment 2 suggests that if there is inhibition, it can be
overcome by acoustic marking of word onsets.

The results of the current experiment also have bearing on the question of whether
lexical access is initiated at all syllable boundaries, or only at boundaries which receive some
special marking. Experiment 1 showed that subjects access words other than those intended by
speakers, raising the possibility that listeners access the meanings of all words consistent with
phoneme or syllable sequences in input. However, Experiment 3 demonstrated that listeners
do not access the meanings of words which begin at syllable boundaries, but lack the special
acoustic marking of word onsets. This could be taken as evidence that lexical access is only
initiated at acoustically marked onsets. Alternatively, it could be argued that lexical access is
initiated continuously, but that successful lexical access depends on the marking of word onsets.

General  Discussion 

The three experiments which we have presented indicate some limitations of existing
models of lexical segmentation and lexical access, and provide the impetus for a new model
built on the strengths of these earlier ones. The results of these experiments support several
conclusions about the nature of lexical segmentation and lexical access. The first is that listeners
simultaneously access the meanings of words associated with several parses of lexically
ambiguous phoneme sequences. The second is that whether or not a word is identified in such
a sequence depends on the acoustic-phonetic features of its onset. The final broad point is that
acoustic marking does not play a role in the termination of a lexical search. Furthermore, this
matching is not stopped by the identification of another potential word onset. In this section
we will discuss the evidence for these three generalizations in the context of a new model, the
Good Start model of lexical access and lexical segmentation.

The Good Start model states that lexical access is continuous, in that listeners treat each
new unit of input as a potential word onset. However, because speech is a noisy and highly
variable medium, listeners are most likely to successfully identify potential words when acoustic
marking in the form of long phonemes and syllables, and unreduced vowels, facilitates the
identification of potential word onsets. Recognizable word onsets get lexical access processes
off to a good start by activating potential lexical hypotheses that can supply top-down
information to aid in the identification of the segments that follow them. Furthermore, our
model posits that lexical search is exhaustive and non-sequential, so simultaneous searches can
access all word meanings consistent with phoneme sequences in the speech signal that begin
with the acoustic clarity to allow access to get off to a good start. We will argue that this model
builds on the strengths of other models of lexical access and segmentation while avoiding some
of their limitations.
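The core idea can be illustrated with a toy sketch. It is not an implementation of the model: it assumes a three-word lexicon, rough phoneme labels, and a binary notion of onset marking, whereas the model describes marking as making successful access more likely rather than all-or-none. The names `LEXICON`, `good_start_access` and `marked_onsets` are hypothetical.

```python
# Toy lexicon: word -> phoneme tuple (rough, hypothetical transcriptions).
LEXICON = {
    "two": ("t", "u"),
    "lips": ("l", "I", "p", "s"),
    "tulips": ("t", "u", "l", "I", "p", "s"),
}


def good_start_access(phonemes, marked_onsets):
    """Return every word matched from a clearly marked onset position.

    A search is considered at every position (continuous access), but in
    this simplified sketch it succeeds only at positions in marked_onsets.
    Searches are not stopped by later onsets, so long words can still match.
    """
    accessed = set()
    for start in range(len(phonemes)):
        if start not in marked_onsets:
            continue  # the search does not get off to a good start
        for word, form in LEXICON.items():
            if tuple(phonemes[start:start + len(form)]) == form:
                accessed.add(word)
    return accessed


signal = ("t", "u", "l", "I", "p", "s")
# Spoken as "two lips": onsets marked at positions 0 and 2.
two_word_reading = good_start_access(signal, marked_onsets={0, 2})
# Spoken as "tulips": only the initial onset is marked.
one_word_reading = good_start_access(signal, marked_onsets={0})
```

Under these assumptions the two-word reading yields "two", "lips" and "tulips", while the one-word reading yields "two" and "tulips" but not "lips", which mirrors the pattern of priming across Experiments 1 to 3.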

The first claim of the Good Start model is that several lexical searches can occur at the
same time. This is a basic assumption of several of the models we have already discussed (Klatt,
1990; McClelland and Elman, 1986). This claim is supported by the finding in Experiment 1 that
concatenated word meanings produce priming and the finding in Experiment 2 that the same
kind of priming stimuli also produce priming for the meaning of the individual words that are
concatenated.

The finding that listeners access words associated with different segmentations
simultaneously argues against the notion that words are accessed in strict sequence. Sequential
access would require listeners to arrive at a single interpretation of ambiguous sequences
before they could go on to interpret the next word. This means that upon hearing "two lips"
they would have to decide that they heard the word "two" (and not the beginning of the word
"tulips" or even "toucans") before they could identify the word "lips". Our finding suggests, then,
that listeners cannot depend on the strategy that some have suggested (Marslen-Wilson and
Welsh, 1978; Cole and Jakimik, 1980) of identifying word boundaries based on the identification
of the words that lead up to them.

It is not entirely surprising that we should find evidence for simultaneous multiple
access. Parallel search is a central feature of most current psychologically motivated models of
lexical access (cf. Marslen-Wilson, 1987; Goldinger, Luce and Pisoni, 1989; McClelland and
Elman, 1986; Grosjean and Gee, 1987). It is generally recognized that the temporal constraints
of real time speech recognition require listeners to pursue many lexical hypotheses in parallel.
While this degree of functional parallelism may seem ungainly in the context of serial
processing, it is a natural outgrowth of the structure of neural and neurally-inspired processing
architectures. The suggestion that listeners access words associated with alternative lexical
segmentations simultaneously merely adds another level of parallelism to the system.

As a multiple access model, Good Start shifts the processing burden from lexical
segmentation to lexical identification. This shift has the important implication of limiting
listeners’ dependence on unreliable acoustic cues, and providing a potential point of contact for
higher level constraints such as syntactic, semantic or pragmatic context to guide performance.
Consider the extreme case of homophones, which are pairs of words that cannot be
distinguished by phonological information. Faced with this semantic distinction without an
acoustic difference, it appears that we access all of their meanings, and eliminate the irrelevant
meanings later based on contextual constraints (Swinney, 1979; Onifer and Swinney, 1981;
Tanenhaus, Leiman, and Seidenberg, 1979). The strategy of multiple access provides a means
of addressing the constraints imposed by the temporal and acoustical structure of the speech
signal.


The Good Start model’s second major claim is that lexical access depends on the
physical characteristics of word onsets. We believe that the dominant way in which word
"boundaries" are marked acoustically is by making word onsets more intelligible. This facilitates
lexical access by giving listeners early access to lexical information which may enhance the
perception of the rest of the word. The contrast in results between Experiments 2 and 3 shows
that acoustic marking affects subjects’ patterns of lexical access. This result is consistent with
the wide range of evidence we have reviewed supporting a role for acoustic marking in lexical
segmentation. In order to understand this role, we must examine the dynamics of lexical
access, and the nature of this acoustic marking.

As we suggested earlier, there are two ways to interpret the role of acoustic marking of
word onsets on lexical segmentation and lexical access. One interpretation is that lexical access
is only initiated at acoustically marked word onsets. Another interpretation is that lexical access
is continuously initiated, but that it is more successful when word onsets receive special acoustic
marking. The first approach has the computational value of limiting the number of searches
that are initiated, and would appear to simplify computation. However, it also carries the
computational cost of requiring listeners to identify potential word boundaries as they hear
them, and to backtrack if they make mistakes.

We would like to suggest instead that lexical access is initiated continuously, but that its
success depends in part on acoustic marking which makes word onsets more intelligible. By
initiating lexical access continuously, or with the onset of each new unit of input, listeners could
avoid the costs of backtracking to correct missed onsets. This strategy would require listeners
to consider a larger number of lexical candidates in order to identify the words that make up an
utterance. While this may be viewed as a considerable computational cost in the context of
models which depend on serial processing (Briscoe, 1989), it may not be costly in the context of
a model like TRACE (McClelland and Elman, 1986) which relies on parallel processing. Given
the emerging orthodoxy that parallel processing is a central feature of human cognitive
function, it is more attractive computationally to initiate lexical access continuously than it is to
initiate lexical access only at likely word onsets. Furthermore, if lexical access is initiated
continuously, then the effects of acoustic marking cannot be viewed as a mechanism to trigger
segmentation. In order to understand the proper role of acoustic marking in lexical access and
segmentation we must examine the nature of this marking.
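The contrast between the two initiation strategies can be made concrete with a small sketch. It is only an illustration under stated assumptions: the toy lexicon, the one-character-per-phoneme transcriptions, and the function names are all hypothetical and are not taken from the experiments reported here.

```python
# Hypothetical toy lexicon: word -> pseudo-phonemic transcription
# (one character per phoneme; an assumption made for illustration only).
LEXICON = {"two": "tu", "lips": "lIps", "lip": "lIp", "tulips": "tulIps"}


def candidates_from(signal, start, lexicon=LEXICON):
    """Return the words whose transcription matches the signal beginning at `start`."""
    return sorted(w for w, phon in lexicon.items() if signal.startswith(phon, start))


def continuous_access(signal, lexicon=LEXICON):
    """Initiate a lexical search at every position of the input."""
    return {i: candidates_from(signal, i, lexicon) for i in range(len(signal))}


def onset_only_access(signal, marked_onsets, lexicon=LEXICON):
    """Initiate a lexical search only at positions flagged as word onsets."""
    return {i: candidates_from(signal, i, lexicon) for i in sorted(marked_onsets)}


if __name__ == "__main__":
    signal = "tulIps"
    # Six searches are started; "lip" and "lips" are found at position 2.
    print(continuous_access(signal))
    # If only position 0 is detected as an onset, "lips" is never considered
    # unless the listener backtracks.
    print(onset_only_access(signal, {0}))
```

In this sketch the continuous strategy starts more searches, but it never has to recover from a missed onset, which is the trade-off described above.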

A  number  of  studies  have  examined  the  physical  properties  of  word  junctures.  There 
are  three  logical  places  to  look  for  juncture  markers:  at  word  offsets,  in  junctures  themselves, 
and  at  word  onsets.  Indeed,  Nakatani  and  Dukes  (1977)  have  found  evidence  that  a  small  set  of 
word offsets are marked by allophonic variation of the phonemes /l/ and /r/, but such marking
does  not  appear  to  exist  in  general.  Junctures  themselves  are  also  marked  in  some 
circumstances  by  pauses  between  words.  This  is  conceptually  similar  to  the  juncture  marking 
used  in  several  TRACE  model  simulations  exploring  segmentation  (McClelland  and  Elman,  1986; 
Frauenfelder  and  Peeters,  1990).  Unfortunately,  pauses  are  generally  reserved  for  major 
grammatical junctures, and are therefore of little use in marking juncture within clauses or
phrases. In general, there is little evidence that acoustic marking of word offsets, or junctures
themselves, plays a primary role in guiding lexical segmentation performance.

It is striking that the best cues for juncture all seem to occur within the first syllable of
words. Word onsets are marked by features such as aspiration and laryngealization (Nakatani
and Dukes, 1977), lengthening of onset phonemes and syllables (Lehiste, 1972; Oller, 1973;
Klatt, 1973; Umeda, 1975; Nakatani and Schaffer, 1978), and the occurrence of full vowels
(Cutler and Carter, 1987; Cutler, 1990; McQueen and Cutler, 1992). This kind of marking is
counterintuitive. If one were deliberately designing the speech signal, one would surely mark
the juncture before word onsets, the way spaces mark juncture in printed text, so that listeners
would not have to backtrack to recognize word onsets.

What these kinds of acoustic marking appear to do is make word onsets more
intelligible. While researchers have paid little attention to issues directly concerning the
intelligibility of word onsets, there is a literature examining the recognizability of stressed versus
unstressed words and syllables. This is relevant because duration and vowel reduction are
acoustic correlates of stress as well as word onsets. Bond and Garnes (1980) found that
stressed syllables are rarely misperceived in fluent speech. Similarly, Kozhevnikov and
Chistovich (1965) found that stressed syllables are detected more readily than unstressed ones
in noisy environments. Lieberman (1967) also found that stressed words are more recognizable
than unstressed words when they are excised from connected speech. Stressed syllables have
often been described as "islands of reliability" in a noisy and highly variable speech stream. The
clear implication of this work is that these syllables, with their lengthened onsets and full
vowels, are more reliably identifiable based on acoustic information alone than are other
syllables.^

The intelligibility of word onsets is critical in lexical access because word onsets provide
the basis for lexical effects. Several studies have shown that lexical effects are larger at word
offsets than they are at word onsets (Marslen-Wilson, 1980; Marslen-Wilson and Welsh, 1978;
Gow and Gordon, under review). This can be explained in terms of the sequential nature of the
speech signal. A clear word onset can define a group of potential lexical candidates. In turn,
these candidates provide hypotheses about how the word is continued. These hypotheses are
the source of top-down information that can be used to enhance the recognition of less
intelligible segments at the ends of words. The cohort model (Marslen-Wilson and Welsh, 1978;
Marslen-Wilson and Tyler, 1980; Marslen-Wilson, 1984) depends on the successful recognition
of word onsets to define the cohort of lexical candidates that will supply this top-down
information. According to the cohort model, lexical access cannot occur unless word onsets are
accurately identified. However, Ganong (1980) has shown that under some circumstances
lexical effects can influence the perception of word onsets. The TRACE model (McClelland and
Elman, 1986) accounts for this effect while still explaining the tendency for lexical effects to be
stronger at word offsets than at word onsets. In TRACE the strength of lexical activation is a
function of the degree to which input is consistent with a particular lexical candidate overall.
This means that lexical effects do not depend completely on the identification of word onsets.
However, TRACE does produce stronger lexical effects at word offsets than at word onsets due
to the sequential nature of the speech signal.
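The way a clearly perceived onset defines a set of candidates, which in turn predict how the word continues, can be sketched as follows. This is a deliberately simplified illustration of the cohort idea, not an implementation of the cohort model or of TRACE; the lexicon, transcriptions, and function names are hypothetical.

```python
# Hypothetical toy lexicon: word -> pseudo-phonemic transcription
# (one character per phoneme; chosen for illustration only).
LEXICON = {"car": "kar", "card": "kard", "cart": "kart", "cargo": "kargo", "go": "go"}


def cohort(prefix, lexicon=LEXICON):
    """Return the words still consistent with the segments heard so far."""
    return sorted(w for w, phon in lexicon.items() if phon.startswith(prefix))


def predicted_next(prefix, lexicon=LEXICON):
    """Return the segments that the current cohort predicts immediately after `prefix`."""
    return sorted(
        {phon[len(prefix)] for phon in lexicon.values()
         if phon.startswith(prefix) and len(phon) > len(prefix)}
    )


if __name__ == "__main__":
    print(cohort("k"))            # candidates defined by the onset alone
    print(predicted_next("kar"))  # top-down hypotheses about the continuation
    print(cohort("karg"))         # the cohort narrows as more input arrives
```

Because the candidate set shrinks as input accumulates, the hypotheses available near the end of a word are more constraining than those available at its onset, which is one way to picture why lexical effects are larger at word offsets.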

To summarize, we believe that acoustic marking of word onsets should not be
considered  a  segmentation  cue.  Instead,  we  believe  that  the  marking  of  word  onsets  has  the 
effect  of  making  them  easier  to  perceive.  Making  onsets  easier  to  perceive  gets  lexical  access  off 
to  a  good  start  by  activating  appropriate  lexical  candidates  that  can  provide  top-down  support 
to  help  listeners  perceive  the  rest  of  a  word. 

The Good Start model’s third major claim is that acoustic marking does not stop lexical
access. This claim comes directly from the results of Experiment 1, which showed that listeners
may concatenate two separate words to access the meaning of an illusory larger word.
Experiment 2 demonstrated that listeners perceive two words in this situation, and so this is not
simply a matter of listeners not recognizing the juncture in these lexically ambiguous phoneme
strings.


This claim is a logical outgrowth of the argument that lexical segmentation is the result
of lexical access, rather than a prerequisite for it. With the hypothesized good start that
acoustic onset marking affords lexical access, top-down processes can help listeners recognize
segments at the end of words, which reduces the necessity for special marking of word offsets.
Furthermore, the GS model suggests that listeners attempt to access all word meanings that are
consistent with phoneme strings in the input. As we have already shown, listeners show
simultaneous activation of both short words and the longer words they can form through
concatenation. We hypothesize that listeners simultaneously access every potential word formed
by the phoneme sequence which follows a good start, and then suppress irrelevant
words after the fact based on contextual constraints, as in the processing of homophones. In
this way, contextual constraint does the work that acoustic marking of word offsets (or indirect
marking of acoustic offsets via the marking of following word onsets) has been hypothesized to
do in strictly prelexical accounts of lexical segmentation. This view is supported by studies by
Cole and Jakimik (1980) and Reddy (1976) which show that lexical segmentation is influenced
by semantic context.
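The hypothesis that every parse consistent with the phoneme string is accessed, and that context then selects among them, can be sketched as follows. The lexicon, the transcriptions, and the crude context filter are hypothetical stand-ins introduced for illustration; the Good Start model itself is not specified at this level of detail.

```python
# Hypothetical toy lexicon: word -> pseudo-phonemic transcription.
LEXICON = {"two": "tu", "lips": "lIps", "tulips": "tulIps"}


def all_parses(signal, lexicon=LEXICON):
    """Return every way of segmenting `signal` into words from the lexicon."""
    if not signal:
        return [[]]  # exactly one way to parse an empty signal: no words
    parses = []
    for word, phon in sorted(lexicon.items()):
        if signal.startswith(phon):
            for rest in all_parses(signal[len(phon):], lexicon):
                parses.append([word] + rest)
    return parses


def select_by_context(parses, favoured_words):
    """Keep parses containing a word favoured by context (a crude stand-in
    for contextual constraint, assumed here for illustration)."""
    return [p for p in parses if any(w in favoured_words for w in p)]


if __name__ == "__main__":
    parses = all_parses("tulIps")
    print(parses)                               # both parses are available at once
    print(select_by_context(parses, {"lips"}))  # context suppresses the other parse
```

Both the one-word and the two-word parse are generated from the same input, and selection happens only afterwards, mirroring the after-the-fact suppression proposed in the text.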

In conclusion, we would like to suggest that the Good Start model provides the outline
of a psychologically realistic mechanism to account for the patterns of lexical access shown by
listeners when presented with lexically ambiguous phoneme strings. It makes the claim that
lexical access during connected speech is non-sequential, and that alternative parses of lexically
ambiguous phoneme strings are simultaneously accessed. Furthermore, it emphasizes the
importance of acoustic marking of word onsets, which may give listeners early access to reliable
lexical information to enhance the perception of words.




References 

Barry, W.J. (1981). Internal juncture and speech communication. In W.J. Barry and K.J. Kohler
(Eds.), Beiträge zur experimentellen und angewandten Phonetik. Kiel: AIPUK,
229-289.

Bond, Z., and Garnes, S. (1980). Misperceptions of fluent speech. In R. Cole (Ed.), Perception
and production of fluent speech. Hillsdale, NJ: Lawrence Erlbaum, 115-132.

Briscoe, E.J. (1989). Lexical access in connected speech recognition. Proceedings of the
Twenty-Seventh Congress, Association for Computational Linguistics, Vancouver.

Butterfield, S., and Cutler, A. (1988). Segmentation errors by human listeners: Evidence for a
prosodic segmentation strategy. Proceedings of SPEECH '88, Seventh Symposium of
the Federation of Acoustic Societies of Europe, Edinburgh, Vol. 3, 827-833.

Cole, R., and Jakimik, J. (1980). A model of speech perception. In R. Cole (Ed.), Perception and
production of fluent speech. Hillsdale, NJ: Lawrence Erlbaum, 133-163.

Cutler, A. (1990). Exploiting prosodic probabilities in speech segmentation. In G.T.M. Altmann
(Ed.), Cognitive models of speech processing: Psycholinguistic and computational
perspectives. Cambridge, MA: MIT Press, 105-121.

Cutler, A., and Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from
juncture misperception. Journal of Memory and Language, 31, 218-236.

Cutler, A., and Carter, D.M. (1987). The predominance of strong initial syllables in the English
vocabulary. Computer Speech and Language, 2, 133-142.




Cutler, A., and Fear, B. (1991). Categoricality in acceptability judgements for strong versus weak
vowels. Proceedings of the ESCA Workshop on phonetics and phonology of
speaking styles, Barcelona.

Cutler, A., and Norris, D. (1988). The role of strong syllables in segmentation for lexical access.
Journal of Experimental Psychology: Human Perception and Performance, 14, 115-
121.

Frauenfelder, U.H., and Peeters, G. (1990). Lexical segmentation in TRACE: An exercise in
simulation. In G.T.M. Altmann (Ed.), Cognitive models of speech processing:
Psycholinguistic and computational perspectives. Cambridge, MA: MIT Press, 50-86.

Frauenfelder, U.H., Segui, J., and Dijkstra, T. (1990). Lexical effects in phonemic processing:
Facilitatory or inhibitory? Journal of Experimental Psychology: Human Perception and
Performance, 16(1), 77-91.

Fry, D.B., Abramson, A.S., Eimas, P.D., and Liberman, A.M. (1962). The identification and
discrimination of synthetic vowels. Language and Speech, 5, 171-189.

Ganong, W.F. (1980). Phonetic categorization in auditory word perception. Journal of
Experimental Psychology: Human Perception and Performance, 6, 110-115.

Gårding, E. (1967). Internal juncture in Swedish. Lund: C.W.K. Gleerup.

Goldinger, S.D., Luce, P.A., and Pisoni, D.B. (1989). Priming lexical neighbors of spoken words:
Effects of competition and inhibition. Journal of Memory and Language, 28, 501-518.

Gow, D.W., and Gordon, P.C. (Under review). Coming to terms with stress: Effects of stress
location in sentence processing.




Grosjean, F. (1985). The recognition of words after their acoustic offset: Evidence and
implications. Perception and Psychophysics, 38, 299-310.

Grosjean, F., and Gee, J.P. (1987). Prosodic structure and spoken word recognition. Cognition,
25, 135-155.

Klatt, D.H. (1973). Interaction between two factors that influence vowel duration. Journal of
the Acoustical Society of America, 54, 1102-1104.

Klatt, D.H. (1980). Speech perception: A model of acoustic-phonetic analysis and lexical access.
In R.A. Cole (Ed.), Perception and production of fluent speech. Hillsdale, NJ:
Lawrence Erlbaum, 243-288.

Kozhevnikov, V., and Chistovich, L. (1965). Speech: Articulation and perception. US
Department of Commerce Translation, JPRS 30,543. Washington, D.C.

Lehiste, I. (1960). An acoustic-phonetic study of internal open juncture. Phonetica, Suppl. 5.

Lehiste, I. (1972). The timing of utterances and linguistic boundaries. Journal of the
Acoustical Society of America, 51, 2018-2024.

Lieberman, P. (1967). Intonation, perception, and language. Research monograph no. 38.
Cambridge, MA: MIT Press.

Luce, P.A. (1986). A computational analysis of uniqueness points in auditory word recognition.
Perception and Psychophysics, 39(3), 155-158.

Marslen-Wilson, W.D. (1980). Speech understanding as a psychological process. In J.C. Simon
(Ed.), Spoken language generation and understanding. Dordrecht: Reidel.




Marslen-Wilson, W.D. (1984). Function and process in spoken word-recognition. In H. Bouma
and D. Bouwhuis (Eds.), Attention and performance X. Hillsdale, NJ: Lawrence
Erlbaum.

Marslen-Wilson, W.D., and Tyler, L.K. (1980). The temporal structure of spoken language
understanding. Cognition, 8, 1-71.

Marslen-Wilson, W.D., and Welsh, A. (1978). Processing interactions and lexical access
during word recognition in continuous speech. Cognitive Psychology, 10, 29-63.

McClelland, J., and Elman, J. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18, 1-86.

McQueen, J.M., and Cutler, A. (1992). Words within words: Lexical statistics and lexical access.
Proceedings of the ICSLP, vol. 1, Banff, Alberta, 221-224.

Mehler, J., Dommergues, J.Y., Frauenfelder, U., and Segui, J. (1981). The syllable's role in
speech segmentation. Journal of Verbal Learning and Verbal Behavior, 20, 298-305.

Nakatani, L.H., and Dukes, K.D. (1977). Locus of segmental cues for word juncture. Journal of
the Acoustical Society of America, 62, 714-719.

Nakatani, L.H., and Schaffer, J.A. (1978). Hearing ’words’ without words: Prosodic cues for word
perception. Journal of the Acoustical Society of America, 63(1), 234-245.

Oller, D.K. (1973). The effect of position in utterance on speech segment duration in English.
Journal of the Acoustical Society of America, 54, 1235-1247.

Onifer, W., and Swinney, D.A. (1981). Accessing lexical ambiguities during sentence
comprehension: Effects of frequency of meaning and contextual bias. Memory and
Cognition, 9, 225-236.




Pisoni, D.B., and Luce, P.A. (1987). Acoustic-phonetic representations in word recognition.
Cognition, 25, 21-52.

Prather, P., and Swinney, D.A. (1977). Some effects of syntactic context upon lexical access.
Presented at a meeting of the American Psychological Association, San Francisco, August
26, 1977.

Reddy, R. (1976). Speech recognition by machine: A review. Proceedings of the IEEE, 64, 501-
531.

Shillcock, R. (1990). Lexical hypotheses in continuous speech. In G.T.M. Altmann (Ed.),
Cognitive models of speech processing: Psycholinguistic and computational
perspectives. Cambridge, MA: MIT Press, 24-49.

Stetson, R.H. (1951). Motor phonetics. Amsterdam: North-Holland.

Swinney, D.A. (1979). Lexical access during sentence comprehension: (Re)consideration of
context effects. Journal of Verbal Learning and Verbal Behavior, 18, 645-660.

Swinney, D.A. (1981). Lexical processing during sentence comprehension: Effects of higher
order constraints and implications for representation. In T. Meyers, J. Laver, and J.
Anderson (Eds.), The cognitive representation of speech. North-Holland.

Tanenhaus, M., Leiman, J., and Seidenberg, M. (1979). Evidence for multiple stages in the
processing of ambiguous words in syntactic contexts. Journal of Verbal Learning and
Verbal Behavior, 18, 427-441.

Umeda, N. (1975). Vowel duration in American English. Journal of the Acoustical Society of
America, 58, 434-445.




Author  Notes 

The research reported in this paper was supported by grant 80-0416 to Harvard University
(Peter C. Gordon, Principal Investigator) from the Air Force Office of Scientific Research, Life
Sciences Directorate. We thank Julia Lee and Matei Mihalca for their assistance in testing subjects.

Direct correspondence to either author at the Department of Psychology, Harvard University,
33 Kirkland St., Cambridge, MA 02138.




Table  1 

Mean reaction times (in msec) for single word and concatenated word prime conditions in
Experiment 1. Accuracy rates are shown in parentheses.

Prime Condition                 Related Probe    Unrelated Probe
                                (FLOWER)         (GRAMMAR)

Single Word (tulips)            798 (.95)        837 (.93)
Concatenated Word (two lips)    812 (.93)        839 (.93)



Table  2 

Mean reaction times (in msec) for combinable and uncombinable prime conditions in
Experiment 2. Accuracy rates are shown in parentheses.

Prime Condition                 Related Probes   Unrelated Probes
                                (KISS)           (BOAT)

Combinable (two lips)           765 (.94)        795 (.95)
Uncombinable (warm lips)        782 (.96)        801 (.93)



Table  3 

Mean reaction times (in msec) for embedded and unembedded word prime conditions in
Experiment 3. Accuracy rates are shown in parentheses.

Prime Condition                 Related Probe    Unrelated Probe
                                (KISS)           (BOAT)

Embedded (tulips)               772 (.90)        751 (.90)
Unembedded (warm lips)          760 (.92)        796 (.94)

796  (.94) 



Footnotes 

1. Cutler and Fear (1991) argue that listeners make a categorical distinction between full and
reduced vowels. They found that listeners’ judgements of the naturalness of word tokens
produced through a cross-splicing technique were influenced by changes in vowel quality but
not in stress. This demonstrates the relative importance of vowel quality. However, it does not
demonstrate that vowel quality is perceived categorically since the only vowel quality distinction
made was between full and reduced vowels.

2. There is a large empirical literature dedicated to the question of what sublexical units, if any,
are identified. The proposed units include the phoneme (Pisoni and Luce, 1987), diphone
(Klatt, 1980), and syllable (Mehler, Dommergues, Frauenfelder and Segui, 1981). The basic
segmentation  strategy  that  we  are  discussing  could  be  instantiated  using  any  of  these  units. 

3. McClelland and Elman emphasize TRACE’s ability to perform relatively accurate
segmentation using a highly simplified input representation lacking any representation of
potential pre-lexical boundary cues, or supralexical contextual constraints. However, they
suggest that TRACE’s performance would be improved by the inclusion of these factors. Both
McClelland and Elman (1986) and Frauenfelder and Peeters (1990) have performed TRACE
simulations using markers in input representations to guide segmentation. The use of these
markers does allow TRACE to overcome the long word bias and avoid concatenation errors. It
should be noted though that these markers guide segmentation by changing the alignment of
the input. Lexical access in TRACE depends on a strict alignment of input over time. Each
lexical node is activated by input from nodes representing phonemes in a particular temporal
alignment. For instance, a node representing the word "cargo" would receive input from nodes
representing the phonemes /k/, /a/, /r/, /g/, and /o/ in sequence. The insertion of a
boundary marker between the /r/ and the /g/ would shift the second syllable out of alignment,
and prevent the nodes representing /g/ and /o/ from sending activation to the node
representing "cargo". This would reduce the amount of activation that "cargo" received, and
thus reduce the amount of inhibition it could direct towards the node representing the word
"go". While this mechanism leads to accurate segmentation in the TRACE model, it is probably
an unrealistic way of explaining human segmentation performance. Unlike TRACE, human
listeners show flexibility in their ability to recognize words given variability in their temporal
alignment. Furthermore, this kind of marking is the equivalent of placing pauses between
words, and real speakers generally do not do this. It appears then that TRACE provides no
realistic mechanism for utilizing acoustic information in lexical segmentation.
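
The alignment argument in this footnote can be illustrated with a small sketch. It is not the TRACE model: it simply counts how many input slots match a word's phonemes under strict temporal alignment, as a crude stand-in for bottom-up support. The "#" boundary symbol and the function name are hypothetical.

```python
def aligned_support(word_phonemes, input_slots, start=0):
    """Count input slots matching the word's phonemes in strict temporal alignment."""
    return sum(
        1
        for offset, phoneme in enumerate(word_phonemes)
        if start + offset < len(input_slots)
        and input_slots[start + offset] == phoneme
    )


CARGO = ["k", "a", "r", "g", "o"]
GO = ["g", "o"]

unmarked = ["k", "a", "r", "g", "o"]
marked = ["k", "a", "r", "#", "g", "o"]  # "#" is a hypothetical boundary marker

if __name__ == "__main__":
    # Without a marker, every slot supports "cargo".
    print(aligned_support(CARGO, unmarked))
    # With a marker after /r/, /g/ and /o/ fall out of alignment for "cargo".
    print(aligned_support(CARGO, marked))
    # "go", aligned after the marker, still receives full support.
    print(aligned_support(GO, marked, start=4))
```

With the marker inserted, "cargo" loses the support of its last two phonemes while "go" keeps all of its support, which is the shift in relative activation described above.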

4. The same trimming criteria were used for all three experiments. Incorrect responses, and
responses with reaction times greater than 3000 msec were excluded from all analyses. No trials
were  excluded  from  the  analysis  of  Experiment  1  on  the  basis  of  reaction  time. 
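
The trimming rule can be written out as a short sketch. The trial records below are invented solely to exercise the function, and the field names are hypothetical; only the two criteria (incorrect response, reaction time greater than 3000 msec) come from the footnote.

```python
def trim_trials(trials, max_rt_msec=3000):
    """Drop incorrect responses and responses slower than `max_rt_msec`."""
    return [t for t in trials if t["correct"] and t["rt_msec"] <= max_rt_msec]


if __name__ == "__main__":
    # Invented example trials (not data from the experiments).
    example_trials = [
        {"rt_msec": 780, "correct": True},
        {"rt_msec": 3200, "correct": True},   # excluded: slower than 3000 msec
        {"rt_msec": 810, "correct": False},   # excluded: incorrect response
        {"rt_msec": 3000, "correct": True},   # kept: not greater than 3000 msec
    ]
    print(trim_trials(example_trials))
```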

5.  Certain  optional  acoustic  boundary  markers  like  laryngealization  and  glottalization  do  not 
appear  to  make  word  onsets  more  intelligible.  However,  they  may  facilitate  syllabification, 
which would be a vital prelexical process if syllables are identified prior to words.


